Introduction

The data set used in this script is drawn from the American Community Survey (ACS) and the Washington Office of Superintendent of Public Instruction (OSPI). Both sources provide person-level data, so each indicator can be examined by the six equity demographic groups of interest.

Access data

PUMS and OSPI data (Elmer)

This data set was compiled from PUMS data.

Looking at the fields in the data set
## [1] "Disability_cat" "Income_cat"     "LEP_cat"        "Older_cat"     
## [5] "POC_cat"        "Youth_cat"      "Total"
##  [1] "educational_attainment"  "healthcare_coverage"    
##  [3] "median_household_income" "household_poverty"      
##  [5] "median_gross_rent"       "crowding"               
##  [7] "SNAP"                    "internet_access"        
##  [9] "Kindergarten readiness"  "tenure"                 
## [11] "rent_burden"
##  [1] 2021 2022 2019 2018 2017 2016 2015 2014 2013 2012 2011

Median household income

1. Explore data

In this section we make sure that the data set makes sense.

Data fields

consistent base data

  • There should be 5 geographies - the 4 counties and the Region.
  • There should be 6 equity focus categories - POC, income, disability, youth, older adult, and LEP
    • 2 sub-categories per focus group (e.g. people of color, non-people of color)
## [1] "Region"    "King"      "Kitsap"    "Pierce"    "Snohomish"
## [1] "Disability_cat" "Income_cat"     "LEP_cat"        "Older_cat"     
## [5] "POC_cat"        "Youth_cat"
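
The base-data expectations listed above could be checked programmatically. The following is a minimal sketch in base R, run against a toy data frame rather than the real data set. The column names `county`, `focus_type`, and `focus_attribute` are assumptions made for illustration and may not match the actual field names.

```r
# Toy data standing in for the real data set; column names are assumptions.
df <- expand.grid(
  county = c("Region", "King", "Kitsap", "Pierce", "Snohomish"),
  focus_type = c("Disability_cat", "Income_cat", "LEP_cat",
                 "Older_cat", "POC_cat", "Youth_cat"),
  level = c("a", "b"),  # placeholder for the two sub-categories
  stringsAsFactors = FALSE
)
df$focus_attribute <- paste(df$focus_type, df$level, sep = "_")

# Count distinct geographies and equity focus categories
n_geographies <- length(unique(df$county))
n_focus_types <- length(unique(df$focus_type))

# Count distinct sub-categories within each focus category
subcats_per_type <- tapply(df$focus_attribute, df$focus_type,
                           function(x) length(unique(x)))

# Stop with an error if any expectation is not met
stopifnot(n_geographies == 5,
          n_focus_types == 6,
          all(subcats_per_type == 2))
```

Using `stopifnot()` makes the script fail loudly when a geography or focus category is missing, instead of relying on a visual check of the printed values.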



indicator-specific data

These fields will vary by indicator:

  • Type of metric - this determines how the data are visualized (est = “percent”, “currency”, or “number”)
  • Number of years (5-year span) - this can vary depending on data availability
  • Number of indicator-specific categories - this can vary depending on the indicator of interest, ranging from N/A (median income) to multiple levels (crowding, housing cost burden)
## [1] "median"
## [1] 2021 2016 2011
## [1] "N/A"

There are 5 geographies and 6 equity focus groups (each with 2 subgroups). There are 3 years in the data set and the indicator specific field has 1 attribute(s), which means there should be a total of 180 rows.

## [1] 170

The data set has 170 rows instead of the expected 180, so 10 rows are missing.
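
The expected row count is the product of the number of levels in each dimension. A short sketch of that arithmetic, using the counts reported in this section (the variable names are illustrative only):

```r
# Number of levels in each dimension, as described in the text
n_geographies <- 5
n_focus_types <- 6
n_subgroups   <- 2
n_years       <- 3
n_attributes  <- 1

# Every combination of levels should appear exactly once
expected_rows <- n_geographies * n_focus_types * n_subgroups *
  n_years * n_attributes

observed_rows <- 170  # row count reported in the output above
missing_rows  <- expected_rows - observed_rows

print(c(expected = expected_rows,
        observed = observed_rows,
        missing  = missing_rows))
```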

checking for missing data

Year / geography

If we look at the data by year and geography, there should be 12 entries per year/geography.

2011 is missing some data for all geographies.

Year / equity focus group

If we look at the data by year and focus group, there should be 10 entries per year/focus group.

##       
##        Disability_cat Income_cat LEP_cat Older_cat POC_cat Youth_cat
##   2011              0         10      10        10      10        10
##   2016             10         10      10        10      10        10
##   2021             10         10      10        10      10        10

The disability category is missing all data in 2011.
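
A cross-tabulation like the one above can be produced with `table()`. The sketch below rebuilds the pattern on toy data. The column names (`data_year`, `county`, `focus_type`) are assumptions, and the 2011 disability rows are dropped on purpose to mimic the gap.

```r
# Build a complete toy grid: 3 years x 5 geographies x 6 categories x 2 subgroups
full <- expand.grid(
  data_year = c(2011, 2016, 2021),
  county = c("Region", "King", "Kitsap", "Pierce", "Snohomish"),
  focus_type = c("Disability_cat", "Income_cat", "LEP_cat",
                 "Older_cat", "POC_cat", "Youth_cat"),
  subgroup = 1:2,
  stringsAsFactors = FALSE
)

# Drop the 2011 disability rows to mimic the gap described in the text
df <- full[!(full$data_year == 2011 & full$focus_type == "Disability_cat"), ]

# Count rows for every year / focus category combination
counts <- table(df$data_year, df$focus_type)
print(counts)

# List the cells that fall short of the expected 10 entries
short <- as.data.frame(counts)
names(short) <- c("data_year", "focus_type", "n")
short <- short[short$n < 10, ]
print(short)
```

Because `Disability_cat` still appears in other years, `table()` keeps its column and reports a 0 for 2011. A category that is absent from every year would not show up at all.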

Year / equity focus sub-group

If we look at the data by year and focus sub-group, there should be 5 entries per year/focus sub-group.

##       
##        English proficient Household with older adult Household with youth
##   2011                  5                          5                    5
##   2016                  5                          5                    5
##   2021                  5                          5                    5
##       
##        Household without older adult Household without youth
##   2011                             5                       5
##   2016                             5                       5
##   2021                             5                       5
##       
##        Limited English proficiency Low Income Non-Low Income Non-POC POC
##   2011                           5          5              5       5   5
##   2016                           5          5              5       5   5
##   2021                           5          5              5       5   5
##       
##        With disability Without disability
##   2011               0                  0
##   2016               5                  5
##   2021               5                  5
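
Rather than reading the gaps off the tables, the missing combinations can be listed directly by comparing the observed rows against a complete grid. This is a sketch on toy data, not the script's own method, and the column names (`data_year`, `county`, `focus_attribute`) are assumptions.

```r
# Every combination that should exist (toy version of the full grid)
expected <- expand.grid(
  data_year = c(2011, 2016, 2021),
  county = c("Region", "King", "Kitsap", "Pierce", "Snohomish"),
  focus_attribute = c("With disability", "Without disability",
                      "Low Income", "Non-Low Income",
                      "Limited English proficiency", "English proficient",
                      "Household with older adult",
                      "Household without older adult",
                      "POC", "Non-POC",
                      "Household with youth", "Household without youth"),
  stringsAsFactors = FALSE
)

# Toy "observed" data: the 2011 disability rows are absent
disability <- c("With disability", "Without disability")
observed <- expected[!(expected$data_year == 2011 &
                         expected$focus_attribute %in% disability), ]

# Build a composite key per row and keep expected rows with no match
make_key <- function(d) paste(d$data_year, d$county, d$focus_attribute, sep = "|")
missing <- expected[!(make_key(expected) %in% make_key(observed)), ]

print(missing)
```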



Year / indicator attribute

If we look at the data by year and indicator attribute, there should be 60 entries per year/indicator attribute.

##       
##        N/A
##   2011  50
##   2016  60
##   2021  60



Numeric data

Check for 0s and NULLs:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22853   65734   84481   83492  102747  166615

There are no 0s or nulls.
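
A minimal sketch of how the zero and missing-value check could be made explicit. The vector below holds a handful of illustrative values only, and the name `estimate` is an assumption, not the actual field name.

```r
# Illustrative values only; the real check would run on the estimate column
estimate <- c(22853, 65734, 84481, 102747, 166615)

# Five-number summary plus mean, as in the output above
print(summary(estimate))

# Count zeros and missing values explicitly
n_zero <- sum(estimate == 0, na.rm = TRUE)
n_na   <- sum(is.na(estimate))

print(c(zeros = n_zero, missing = n_na))
```

Counting explicitly is useful because `summary()` shows a minimum of 0 when zeros exist but does not say how many there are.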

A plot of the distribution of all data is not the most useful visual, but it gives a high-level sense of the range of values in one plot.

This table includes a lot of information about the data set and helps to show the different levels of each field. It provides another way to check if data are available for all counties and all years, or where there may be gaps in the data set.


Data labels, shares

These charts were generated to ensure the labels across years are consistent and make sense. There had been an issue with misassigned labels because tidycensus::pums_variables, the only digital data dictionary available to associate labels with codes, exists only from 2017 forward. Most variables have had consistent codes, but where the codes have shifted over time, using the 2017 lookup mischaracterizes categories.
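
One way to surface shifted codes is to compare the label attached to each code across years. The sketch below uses a small hypothetical lookup table. The variable name `EXAMPLE`, its codes and labels, and the existence of a hand-built lookup for a year before 2017 are all assumptions for illustration.

```r
# Hypothetical lookup: one row per year / variable / code (not real PUMS values)
lookup <- data.frame(
  year      = c(2016, 2016, 2017, 2017),
  var_code  = "EXAMPLE",
  val       = c("1", "2", "1", "2"),
  val_label = c("Category A", "Category B", "Category A", "Category C"),
  stringsAsFactors = FALSE
)

# For each variable/code pair, count how many distinct labels it has had
n_labels <- tapply(lookup$val_label,
                   paste(lookup$var_code, lookup$val),
                   function(x) length(unique(x)))

# Codes with more than one label have shifted meaning over time
shifted <- names(n_labels[n_labels > 1])
print(shifted)
```

Any code flagged this way would need a year-specific label instead of the 2017 label.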

These charts also help to confirm that the shares add up to 100%, which is only relevant when indicator_attribute has more than one category. The indicator_attribute for median household income is "N/A".
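
For indicators with more than one indicator_attribute category, the 100% check can also be done numerically. This is a sketch on made-up shares; the column names `data_year` and `share` are assumptions.

```r
# Made-up shares for an indicator with three categories in two years
df <- data.frame(
  data_year = rep(c(2016, 2021), each = 3),
  indicator_attribute = rep(c("A", "B", "C"), times = 2),
  share = c(0.50, 0.30, 0.20, 0.40, 0.35, 0.25)
)

# Sum the shares within each year
totals <- tapply(df$share, df$data_year, sum)

# Compare to 1 with a small tolerance to allow for floating-point rounding
sums_to_one <- abs(totals - 1) < 1e-8
print(sums_to_one)
```

In practice the sum would be taken within each geography and focus sub-group as well as each year, and the tolerance may need to be loosened if the published shares are rounded.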

The colors of the charts may not be consistent between the years depending on missing data.

2. Visually explore data

2a. Scatter plots

In this section we start to explore the data visually - distribution by the different dimensions within the data set. These plots are helpful to check for outliers and get a higher level understanding of the data in one visual, before slicing the data by geography and equity focus group in the following sections.

The following code will need to be adjusted to fit the fields specific to the data indicator. For educational attainment, we focus on those with a Bachelor’s degree or higher. The following code establishes the data frame that the rest of the analysis uses. If there are fewer than 2 indicator attributes, this section can be skipped/commented out, but the code will need to be adjusted throughout.

By indicator_attribute

This section isn’t relevant for this specific indicator because there aren’t unique indicator attributes.

By Year


2b. Facets by geography

In this section we explore trends by different groups with MOEs. These charts help to show any missing data by geography, year, or focus group/subgroup.



3. Developing visuals

In this section we further develop the draft visuals for communicating the results and supporting the narrative for the Equity Tracker webpages. These charts are slightly more refined by slicing the data by geography and equity focus group. The line charts don’t include MOEs, but they help make connections between the same groups over time.

Line charts by geography

High / low vulnerability groups



First / last years



calculated difference b/t

The 5 geographies are all included in the facets by geography, but they could be separated out to create 5 individual charts - one for each geography.
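
A sketch of how a difference between the first and last years might be calculated for each geography. The values are made up, the column names (`county`, `data_year`, `estimate`) are assumptions, and this is not necessarily how the script computes it.

```r
# Made-up estimates for two geographies over three years
df <- data.frame(
  county    = rep(c("King", "Pierce"), each = 3),
  data_year = rep(c(2011, 2016, 2021), times = 2),
  estimate  = c(100, 110, 130, 80, 85, 95)
)

first_year <- min(df$data_year)
last_year  <- max(df$data_year)

# Keep only the first and last years, one row per geography each
first <- df[df$data_year == first_year, c("county", "estimate")]
last  <- df[df$data_year == last_year,  c("county", "estimate")]

# Join the two years side by side and subtract
change <- merge(first, last, by = "county",
                suffixes = c("_first", "_last"))
change$difference <- change$estimate_last - change$estimate_first

print(change)
```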

Line charts by equity group

High / low vulnerability groups



First / last years



calculated difference b/t

The 6 equity focus groups are all included in the facets by equity group, but they could be separated out to create 6 individual charts - one for each focus group.

Cleveland dot plot

Resource for visual
The code to make this type of visual is long - adjust to the indicator as needed (scale_x_continuous, labs, label, etc.).


4. Save files

This section needs to be edited. Keep the code chunks commented out for now as we draft and refine the visuals.

PNG

HTML

Copy files from Y drive > website folder



5. Archive

This section includes visuals that were determined to be less useful. We didn’t want to lose the work, but didn’t want to include it in the main workflow. Feel free to comment out if you don’t want to adjust the arguments to fit the indicator of interest.

Line chart: all categories

Line chart: by vulnerability



3 visuals for webpage

1. Map of most recent data

2. Facet chart of most recent

There are 5 charts for the different geographies: Region and the 4 counties.

3. Time series

Line chart

By geography

There are 5 charts for the different geographies: Region and the 4 counties.

All years

First/last years

There are 5 charts for the different geographies: Region and the 4 counties.

By equity group

There are 6 charts for the different equity groups: POC, low-income, etc.

All years

First/last years

There are 6 charts for the different equity groups: POC, low-income, etc.

Cleveland dot plot


